1. OVERVIEW


Project  goals  were  to  improve  the  basic  neuroscientific  understanding  of  imagined  speech  and 
intended  direction  and  to  develop  signal-processing  methods  relevant  to  their  decoding  from 
brain  imaging  data.  We  used  non-invasive  brain-imaging  methods:  electroencephalography 
(EEG),  magnetic  resonance  imaging  (MRI),  and  magneto-encephalography  (MEG).  Project 
research  was  conducted  by  faculty,  post-docs,  graduate  students,  and  undergraduate  students  at 
three  academic  institutions:  the  University  of  California,  Irvine  (UCI),  Carnegie  Mellon 
University  (CMU),  and  New  York  University  (NYU).  Work  at  UCI  focused  on  EEG  studies  of 
imagined  speech  and  of  intended  direction.  Work  at  CMU  focused  on  analysis  of  EEG  data 
collected  at  UCI.  Work  at  NYU  focused  on  MEG  and  fMRI  studies.  This  report  presents 
scientific  results  in  Section  2,  auxiliary  information  in  Section  3,  and  bibliographic  references  in 
Section  4. 

2.  SCIENTIFIC  RESULTS 

The  first  section  presents  results  for  imagined  speech  (Section  2A).  It  covers  work  at  NYU  on  an 
efference  copy  model  for  speech  production,  which  provides  for  the  generation  of  articulatory 
and  auditory  imagery  during  imagined  speech,  and  experimental  evidence  in  its  favor.  It  then 
turns  to  work  with  EEG  at  UCI  and  CMU.  This  work  focuses  primarily  on  the  use  of  machine 
learning  methods  to  determine  circumstances  under  which  EEG  provides  information  about 
imagined  speech. 

The  second  section  presents  results  for  intended  direction  (Section  2B).  Experimental  work  at 
UCI  shows  that  the  direction  of  attention  to  auditory  stimuli  stimulates  brain  networks  similar  to 
those  found  when  directing  attention  to  visual  stimuli.  It  shows  also  that  the  bottom-up  direction 
of  attention  to  a  visual  target  generates  signals  in  EEG  which  may  be  used  to  infer  where  the 
target  is  located  in  the  visual  field.  Finally,  work  with  BCIs  has  led  to  successful  demonstrations 
of  navigation  in  virtual  environments  and  of  robot  remote  control  using  brain  waves. 

2A.  IMAGINED  SPEECH 

Project  results  for  imagined  speech  include  the  development  of  an  efference  copy  model  of 
speech  production  (Tian  &  Poeppel,  2010,  2012).  The  model  describes  the  brain's  generation  of 
the  auditory  and  motor  imagery  that  one  experiences  while  imagining  speech.  Efference  copy 
models  feed  voluntary  motor  commands  back  to  various  brain  centers  so  that  the  predicted 
effects  of  these  voluntary  actions  on  the  organism,  including  changes  in  what  is  sensed,  can 
better  be  taken  into  account.  This  kind  of  model  dates  back  to  Von  Helmholtz  and  was  put  on  a 
firm  footing  by  the  work  of  Von  Holst  and  Mittelstaedt  (1950)  and  Sperry  (1950)  on  motor 
control  and  visuomotor  coordination. 

Efference  copy  model.  The  Tian  &  Poeppel  (2010,  2012)  model  is  shown  in  Figure  1.  Motor 
commands  concerning  speech  production  are  copied  and  sent  to  a  module  that  helps  to  predict 
the  resulting  motor  state.  If  there  is  a  significant  difference  between  sensed  and  predicted  motor 
states (e.g., a situation encountered when one is talking with one's mouth full), then motor
planning can be changed appropriately. The motor efference copy also allows for prediction of
the sensory effects of speech production. A perceptual efference copy is used to predict what
will  be  heard  when  speech  motor  actions  are  carried  out.  The  motor  state  estimation  and  sensory 
prediction  processes  are  thought  to  be  associated  with  motor  imagery  and  auditory  perceptual 
imagery,  respectively,  during  imagined  speech  production.  This  assumption  lets  the  efference 
copy  model  link  speech  production  and  imagined  speech  production. 


Figure  1.  Results  of  MEG  studies  by  Tian  &  Poeppel  on  imagined  movement  and  imagined  speech  suggest  that  an 
internal  forward  model  underlies  mental  imagery  of  speech.  The  motor  systems  that  mediate  action  preparation 
carry  out  the  same  functions  in  the  mental  imagery  of  speech,  but  only  perform  motor  simulation,  in  the  sense  that 
the  planned  motor  commands  are  truncated  along  the  path  to  primary  motor  cortex  and  are  not  executed  (the  red 
cross  over  external  outputs).  A  copy  of  such  planned  motor  commands  is  processed  internally  and  is  used  to 
estimate  the  associated  somatosensory  consequences.  A  copy  of  the  somatosensory  estimate  is  sent  on  to 
modality-specific  areas,  and  the  perceptual  consequences  that  would  be  produced  by  the  overt  action  are  estimated. 
Imagery  associated  with  articulator  movement  and  auditory  perception  during  imagined  speech  is  held  by  the 
model  to  be  the  result  of  residual  activity  from  these  internal  estimation  processes  and  is  linked  to  the  absence  of 
cancellation  from  external  feedback  (marked  by  the  red  Xs  over  somatosensory  and  sensory  feedback  pathways). 
After Tian & Poeppel (2012).


MEG  topographies  found  while  imagining  speech.  The  model  is  supported  by  a  variety  of 
experimental  results  on  speech  using  MEG  (Tian  &  Poeppel,  2010,  2012,  2013,  2015).  One  such 
result  (Tian  &  Poeppel,  2010)  is  shown  in  Figure  2.  MEG  topographies  found  when  imagining 
speech  articulation  or  when  imagining  speech  auditory  perception  resemble  those  found  when 
actually hearing speech. This provides evidence in favor of the model shown in Figure 1.

Figure 2. Producing articulation imagery (top left) generates in MEG a topography (scalp response pattern) that
differs from that associated with overt articulation (bottom topography). Yet producing articulation imagery by
imagining speaking leads shortly thereafter (170 msec) to a response topography (top right) that strongly resembles
that associated with hearing speech (middle right). Likewise, when one produces hearing imagery by imagining
hearing speech, the resulting topography (middle left) resembles that found when hearing actual speech (middle
right).


Time-frequency window for use of speech production predictions. An MEG study by Tian and
Poeppel  (2015)  introduces  further  evidence  in  favor  of  the  model.  Tian  and  Poeppel  argue  that  a 
critical  subroutine  of  self-monitoring  during  speech  production  is  to  detect  any  deviation 
between  expected  and  actual  auditory  feedback.  They  investigated  the  associated  neural 
dynamics  using  MEG  recording  in  mental-imagery-of-speech  paradigms.  Participants  covertly 
articulated  the  vowel  /a/;  their  own  (individually  recorded)  speech  was  played  back,  with 
parametric  manipulation  using  four  levels  of  pitch  shift,  crossed  with  four  levels  of  onset  delay. 
A  non-monotonic  function  was  observed  in  early  auditory  responses  when  the  onset  delay  was 
shorter  than  100  msec.  Suppression  was  observed  for  normal  playback,  but  enhancement  for 
pitch-shifted  playback.  However,  the  magnitude  of  enhancement  decreased  at  the  largest  level  of 
pitch  shift  that  was  out  of  pitch  range  for  normal  conversion,  as  suggested  in  two  behavioral 
experiments.  No  difference  was  observed  among  different  types  of  playback  when  the  onset 
delay  was  longer  than  100  msec. 

These  results  suggest  that  the  prediction  suppresses  the  response  to  normal  feedback,  which 
mediates  source  monitoring  (see  Figure  3).  When  auditory  feedback  does  not  match  the 
prediction,  an  “error  term”  is  generated,  which  underlies  deviance  detection.  Tian  &  Poeppel 
(2015)  suggest  that  a  frequency  window  (addressing  spectral  differences)  and  a  time  window 
(constraining  temporal  differences)  jointly  regulate  the  comparison  between  prediction  and 
feedback in speech.

Figure 3. Efference copies produced during speech production generate internal predictions, which may be
compared to auditory feedback to determine whether there are errors in speech production that need to be
addressed. Results of an MEG experiment suggest that the error term is evaluated within a time/frequency window
(Tian & Poeppel, 2015).


Dual-stream model of speech production prediction. In work as yet unpublished, team
members at NYU performed an fMRI experiment whose results indicate that two processing
streams are likely to underlie perceptual prediction in imagined speech production.

Subjects  performed  two  speech  imagery  tasks  —  articulation  imagery  (AI)  and  hearing  imagery 
(HI)  —  designed  to  differentially  recruit  the  two  streams.  As  shown  in  Figure  4,  AI  induced 
greater  activity  in  the  simulation-estimation  stream,  including  sensorimotor  cortex,  subcentral 
(BA  43),  middle  frontal  cortex  (BA  46)  and  supramarginal  gyrus  (SMG),  suggesting  more 
recruitment  of  simulation  and  estimation  functions.  Moreover,  AI  showed  more  activation  in 
posterior  superior  temporal  sulcus  compared  with  HI,  suggesting  that  precise  auditory 
representation  can  be  obtained  via  simulation-estimation  mechanisms.  On  the  other  hand, 
distributed  memory  networks,  including  middle  frontal  (BA  8),  inferior  parietal  cortex  and 
intraparietal  sulcus,  were  more  activated  during  HI  compared  to  AI,  suggesting  a  role  for  the 
memory-retrieval  prediction  pathway  in  the  HI  task. 

These  results  demonstrate  that  neural  systems  implement  motor  simulation-estimation  and 
memory  retrieval  as  two  distinct  mechanisms  to  internally  construct  corresponding  perceptual 
outcomes.  These  two  mechanisms  serve  as  a  foundation  for  predicting  perceptual  changes,  either 
via  an  established  causal  relationship  between  actions  and  their  perceptual  consequences  or  via 
stored perceptual experiences of environmental regularity.

Figure 4. Dual stream prediction model (DSPM). Top: approximate cortical regions in the hypothesized dual
streams. Bottom: schematic diagram of the DSPM (color scheme corresponds to the anatomical locations above).
The abstract auditory representations (orange) can be induced from both bottom-up and top-down processes and
are formed around regions of STG and STS. The top-down induction process can be carried out in either the
memory-retrieval or simulation-estimation prediction pathway. The memory-retrieval stream (blue) includes
pMTG, MTL and distributed frontal-parietal networks, for retrieval from long-term lexical items, episodic and
semantic memory, respectively. The simulation-estimation stream (red) includes the frontal motor system and
parietal somatosensory system. The articulatory trajectory is planned in frontal motor regions, including IFG,
PMC, INS and SMA. If covert production is the goal, the planned articulation signal bypasses M1 and is
simulated internally. The somatosensory consequence of the simulated articulation is estimated over parietal
somatosensory regions, including SI, SII and SMG. The auditory consequence, in the form of an abstract
auditory representation, is derived from the subsequent estimation. A highly specified auditory representation
(thick arrow) is obtained in the bottom-up perceptual process that goes through spectrotemporal analysis of
external stimuli in STG (brown). The stream in which the motor simulation and perceptual estimation processes are
available can enrich the specificity of predicted auditory representations (solid arrows), compared to enrichment
from the memory-retrieval stream (dotted arrows). Abbreviations: STG, superior temporal gyrus; STS, superior
temporal sulcus; pMTG, posterior middle temporal gyrus; MTL, middle temporal lobe; IFG, inferior frontal
gyrus; PMC, premotor cortex; INS, insula; SMA, supplementary motor area; M1, primary motor cortex; SI,
primary somatosensory cortex; SII, secondary somatosensory cortex; and SMG, supramarginal gyrus. After Tian,
Zarate, & Poeppel (submitted).

Imagined speech identification. EEG experiments on speech production show that one can use
EEG  traces  of  the  heard  or  the  imagined  speech  loudness  envelope  to  determine  which  speech 
stream  one  is  listening  to  or  imagining,  respectively  (Deng  et  al.,  2010).  There  are  also  positive 
results  for  distinguishing  imagined  words  and  sentences  (Lappas,  2011)  and  imagined  phonemes 
(Brigham  &  Vijaya  Kumar,  2010)  on  the  basis  of  information  other  than  loudness  envelope. 
Furthermore,  experimental  results  show  also  that  one  can  use  EEG  traces  of  speech  loudness 
envelopes  to  determine  the  speech  stream  to  which  one  is  paying  attention  (Horton  et  al.,  2011, 
2013,  2014). 

Loudness  envelope  information  in  EEG.  An  early  EEG  experiment  on  imagined  speech  at  UCI 
involved  imagined  sentences  drawn  from  the  TIMIT(sx)  database.  A  pilot  set  of  EEG  data 
collected  in  early  fall  2008  was  analyzed  to  determine  whether  any  electrode  waveforms  bore 
traces  of  the  temporal  envelope  of  the  acoustic  speech  waveform  heard  during  the  cue  period  of 
the trial. Such an envelope is illustrated in Figure 5. Six sentences drawn from the TIMIT(sx)
database  were  used.  Signal  processing  methods  involving  the  Hilbert-Huang  transform  let  one 
detect  the  envelope  in  EEG  signals  when  a  subject  is  listening  to  speech.  This  analysis  was 
applied  to  an  existing,  full  dataset  in  a  study  of  steady-state  auditory  evoked  potentials  (SSAEP). 
When  listening  to  speech  in  this  SSAEP  experiment,  instantaneous  frequencies  extracted  by  a 
Hilbert-Huang  transform  are  correlated  with  the  envelope  of  the  speech  signal  (Deng  & 
Srinivasan, 2010). The same methods were then applied to the pilot dataset to determine whether
any electrodes had waveforms correlated with the envelope of the acoustic speech waveform
during the imagined speech period of the trial. Such electrodes were found. Intuitively, this means
that the varying loudness of a sentence (its pattern of stress) is echoed in EEG both when one
listens to that sentence and when one imagines it.

Figure 5. The envelope of the TIMIT(sx) sentence number 37, "critical equipment needs proper maintenance",
shows  how  loudness  varies  as  a  function  of  time  for  this  particular  speaker. 
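
The notion of an envelope trace in EEG can be illustrated with a minimal sketch. It is not the Hilbert-Huang analysis used in the project: it assumes a plain FFT-based Hilbert transform, a synthetic amplitude-modulated tone in place of a TIMIT(sx) sentence, and a simulated "EEG" channel that carries a weak copy of the envelope. All names and parameter values are illustrative.

```python
import numpy as np

def analytic_envelope(x):
    """Amplitude envelope |x + i*H(x)| computed with an FFT-based Hilbert transform."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC term once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist term once
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))

fs = 1000                                  # sampling rate in Hz (illustrative)
t = np.arange(2 * fs) / fs                 # two seconds of samples

# Synthetic stand-in for speech: a 200 Hz carrier whose loudness varies at 4 Hz.
true_envelope = 1.0 + 0.8 * np.sin(2 * np.pi * 4 * t)
speech = true_envelope * np.sin(2 * np.pi * 200 * t)
envelope = analytic_envelope(speech)

# Simulated EEG channel: a weak copy of the envelope buried in noise (assumption).
rng = np.random.default_rng(0)
eeg = 0.3 * (true_envelope - true_envelope.mean()) + rng.normal(0.0, 1.0, t.size)

r_env = np.corrcoef(envelope, true_envelope)[0, 1]
r_eeg = np.corrcoef(envelope, eeg)[0, 1]
print(f"envelope recovery r = {r_env:.3f}; envelope-to-EEG r = {r_eeg:.3f}")
```

In this toy setting the envelope-to-EEG correlation is small but well above what unrelated noise would produce, which is the kind of weak trace the analysis looks for.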

Imagined  speech  rhythm  identification  using  EEG.  Analysis  of  data  from  the  "BaKu" 
experiment  shows  that  one  can  use  EEG  to  determine  the  rhythm  with  which  syllables  are 
produced  in  imagination  (Deng  et  al.,  2010).  An  initial  analysis  of  high-density  EEG  data  from 
the BaKu experiment, in which two syllables (/ba/ and /ku/) were spoken in imagination in one
of three rhythms, showed that information concerning imagined speech is present in EEG alpha
(9-12 Hz), beta (13-18 Hz) and theta (3-8 Hz) bands. Discovering in the BaKu data informative
spectral  features  within  bands  led  to  follow-on  work  with  the  Hilbert-Huang  transform  (Huang  et 
al.,  1998),  spearheaded  by  former  UCI  grad  student  Siyi  Deng.  This  analysis  focused  on  using 
predictive  classification  of  envelopes  to  decode  the  rhythm  with  which  imagined  syllables  are 
produced.  A  modified  Second  Order  Blind  Identification  (SOBI)  algorithm  was  used  to  help 
enhance  the  signals  and  reduce  dimensionality  (Cardoso,  1998;  Tang  et  al.,  2005).  The  SOBI 
algorithm  uses  the  consistent  temporal  structure  along  multi-trial  EEG  data  to  blindly  decompose 
the  original  recordings  into  a  set  of  neuroanatomically-grounded  components.  These  SOBI 
components  possess  broad  spatial  distributions  across  the  scalp  that  distinguish  left/right, 
front/back,  etc.  In  Deng's  work,  joint  temporal  and  spectral  features  were  extracted  from  the 
Hilbert  spectra  of  selected  SOBI  components,  after  performing  a  Hilbert-Huang  transform. 

Hilbert  spectra  of  empirical  mode  components  provide  accurate  time-spectral  representations  of 
non-stationary  data  that  are  sparser  than  representations  provided  by  conventional  techniques 
like  short-time  Fourier  spectrograms  and  wavelet  scalograms  (see  Figure  6).  Predictive 
classification  of  the  three  rhythms  yields  good  results  for  inter-trial  transfer,  with  performance 
for  all  seven  subjects  lying  at  a  significantly  greater  than  chance  level. 


Figure  6.  Comparison  of  the  spectrogram,  wavelet  scalogram  and  Hilbert  spectrum  of  the  same  time  series.  Top 
plot:  Original  time-varying  EEG  signal  from  SOBI  output.  Second  plot:  Short-time  Fourier  transform 
spectrogram. Third plot: Morlet wavelet scalogram. Bottom plot: Hilbert spectrum, projected onto an 18 Hz by
384 time-point grid. Note that the Hilbert spectrum representation is considerably sparser than that of the STFT
spectrogram and the wavelet scalogram.


The  paper  that  describes  the  application  of  the  empirical  mode  decomposition  to  the 
identification of BaKu trial rhythm (Deng et al., 2010) refers also to the successful use of
class-averaged spectrograms. An advantage of this latter method is that it can be implemented as a
real-time  classification  procedure  for  a  brain-computer  interface.  The  method  finds  the  average 
spectrogram  generated  in  trials  of  various  classes  and  orthogonalizes  these  spectrograms  to 
produce  spectrogram  matched  filters.  Matched  filter  outputs  on  a  particular  trial,  found  by 
computing  a  scalar  product  of  the  matched  filter  spectrogram  and  the  trial  spectrogram,  are  used 
to  classify  the  trial:  the  matched  filter  producing  the  largest  scalar  product  output  identifies  the 
trial class. While the classification performance for BaKu rhythm found using class-averaged
spectrograms  is  poorer  than  that  found  using  the  SOBI/HHT  methods,  the  margin  is  not  wide. 
Furthermore,  its  electrode-by-electrode  analysis  lets  one  identify  informative  electrodes  for 
particular  subjects.  The  identification  provides  no  evidence  for  a  single  set  of  informative  scalp 
locations  common  to  all  subjects. 
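
The matched-filter procedure described above can be sketched as follows. The report does not say how the class-averaged spectrograms were orthogonalized; this sketch assumes a pseudo-inverse, which yields filters whose scalar product is 1 with their own class average and 0 with the others. The spectrograms are synthetic arrays (template plus noise), and the function names and array sizes are hypothetical.

```python
import numpy as np

def fit_matched_filters(spectrograms, labels):
    """Build one matched filter per class from class-averaged spectrograms."""
    classes = np.unique(labels)
    means = np.stack([spectrograms[labels == c].mean(axis=0).ravel() for c in classes])
    # Assumed orthogonalization: the pseudo-inverse gives filters whose scalar
    # product is 1 with their own class average and 0 with the other averages.
    filters = np.linalg.pinv(means).T
    return classes, means, filters

def classify(spectrogram, classes, filters):
    """Assign the class whose matched filter gives the largest scalar product."""
    scores = filters @ spectrogram.ravel()
    return classes[np.argmax(scores)]

rng = np.random.default_rng(0)
n_classes, n_freq, n_time = 3, 18, 40          # three rhythms; grid size is illustrative
templates = rng.random((n_classes, n_freq, n_time))

def simulate(n_per_class):
    """Synthetic trials: class template plus Gaussian noise."""
    labels = np.repeat(np.arange(n_classes), n_per_class)
    noise = rng.normal(0.0, 0.5, (labels.size, n_freq, n_time))
    return templates[labels] + noise, labels

train_x, train_y = simulate(30)
test_x, test_y = simulate(20)
classes, means, filters = fit_matched_filters(train_x, train_y)
predictions = np.array([classify(s, classes, filters) for s in test_x])
accuracy = np.mean(predictions == test_y)
print(f"three-way accuracy on synthetic trials: {accuracy:.2f}")
```

Because classification reduces to a few scalar products per trial, a scheme of this kind is cheap enough to run in real time, which is the advantage noted in the text.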

The  results  suggest  that  the  rhythmic  structure  of  imagined  syllable  production  can  be  detected  in 
non-invasive  brain  recordings,  and  provide  a  step  toward  the  development  of  an  EEG-based 
system  for  communicating  imagined  speech. 

Deng  and  colleagues  (2010)  reported  negative  results  for  determining  which  of  two  syllables  was 
imagined  in  the  BaKu  experiment  using  empirical  mode  decomposition  or  matched  filter 
methods.  However,  CMU  graduate  student  Kathy  Brigham  and  Kumar  Bhagavatula  succeeded 
in  classifying  /ba/  and  /ku/  trials  (Brigham  &  Vijaya  Kumar,  2010).  They  were  able  to  classify 
imagined  syllables  using  data  preprocessing  methods  which  relied  on  Hurst  exponents  to  remove 
trials and electrodes deemed to contain artifacts (e.g., electromyographic). A large number of
trials  and  electrodes  were  removed.  The  remaining  trials  were  used  to  determine  whether  /ba/ 
and  /ku/  may  be  discriminated  from  one  another  using  EEG.  Results  varied  from  "not  at  all"  to  a 
high  of  88%  classification  success  (vs.  50%  guessing  rate),  depending  on  subject. 
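
To give a sense of how a Hurst exponent can serve as a screening statistic, the sketch below estimates it from the growth of lagged differences and flags signals whose estimate falls outside an acceptance band. This is only one of several estimators, and neither the estimator nor the band limits (0.3 and 0.7) are taken from Brigham & Vijaya Kumar (2010); they are assumptions chosen for illustration, as are the synthetic signals.

```python
import numpy as np

def hurst_exponent(x, max_lag=50):
    """Estimate H as the log-log slope of the spread of lagged differences."""
    lags = np.arange(2, max_lag)
    spread = np.array([np.std(x[lag:] - x[:-lag]) for lag in lags])
    slope, _intercept = np.polyfit(np.log(lags), np.log(spread), 1)
    return slope

def flag_artifacts(channels, low, high):
    """Return the H estimates and a mask of channels outside [low, high]."""
    h = np.array([hurst_exponent(c) for c in channels])
    return h, (h < low) | (h > high)

rng = np.random.default_rng(0)
n = 10000
white = rng.normal(size=n)                        # uncorrelated noise, H near 0 here
walk = np.cumsum(rng.normal(size=n))              # random walk, H near 0.5 here
drift = np.sin(2 * np.pi * np.arange(n) / 5000)   # slow smooth drift, H near 1 here

# Hypothetical acceptance band; the published criterion is not given in this report.
h, flagged = flag_artifacts([white, walk, drift], low=0.3, high=0.7)
for name, value, bad in zip(["white", "walk", "drift"], h, flagged):
    print(f"{name}: H = {value:.2f}, flagged = {bool(bad)}")
```

Under this convention, the noise-like and drift-like signals are rejected and the random-walk-like signal is kept. A real pipeline would apply the same test per trial and per electrode.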

Heard  and  imagined  sentence  identification  using  EEG.  Having  obtained  his  doctoral  degree, 
UC  Irvine  Associate  Specialist  Dr.  Siyi  Deng  completed  a  further,  as  yet  unpublished,  study  on 
whether  EEG  can  be  used  to  determine  which  sentence  someone  listens  to  or  produces  in 
imagination  (Deng  et  al.,  submitted).  The  study  focused  on  the  loudness  envelope  of  speech  (its 
loudness  as  a  function  of  time)  and  its  representation  in  EEG  signals  to  determine  whether 
sentences  can  be  discriminated  on  the  basis  of  the  EEG  representation  of  their  loudness 
envelopes. 

The  experiment  used  both  heard  and  imagined  speech  to  determine  whether  cortical  signatures  of 
heard  speech  can  be  used  to  identify  imagined  speech.  Each  trial  in  the  experiment  presented 
one  of  six  possible  spoken  sentences;  it  was  both  heard  and,  immediately  afterwards,  produced  in 
imagination.  The  analysis  focused  on  the  use  of  envelope  following  responses  (EFRs)  to  identify 
heard  sentences  and  to  identify  imagined  sentences.  Common  to  both  analyses  was  the 
employment  of  source  imaging  methods  to  find  the  cortical  origins  of  EFRs  to  heard  speech. 
Reconstructing  the  EEG  from  the  strongest  sources  of  the  EFRs  in  parietal  and  temporal  cortex, 
shown  in  Figure  7,  improved  the  correlation  between  EEG  and  the  amplitude  envelope  of  the 
heard speech.

Figure 7. Source localization results for one of the subjects show the cortical distributions of the two strongest
EEG components in which envelope following responses (EFRs) are found. These sources are illustrated using a
color  scale  in  which  red  indicates  the  greatest  strength.  The  strongest  sources,  found  in  parietal  and  temporal 
cortex,  are  used  to  find  the  cortical  origins  of  EFRs  to  heard  speech  and  to  estimate  the  EEG  generated  by  these 
sources. 


Single-trial  classification  performance  with  the  heard  sentences  was  found  to  be  statistically 
significant  for  two  of  eight  subjects.  Significant  classification  performance  was  found  for  all 
subjects  when  one  used  EEG  data  from  multiple  trials  of  the  same  sentence,  concatenated  to 
produce  data  of  greater  duration.  The  improvement  in  classification  with  heard  speech  duration 
is  shown  in  the  left  panel  of  Figure  8.  In  order  to  classify  EEG  recordings  of  imagined  speech, 
activities  at  the  cortical  sources  determined  for  heard  speech  were  estimated  from  EEG  data 
recorded  while  speech  was  imagined.  Referring  to  the  right  panel  of  Figure  8,  one  sees  that 
classification  performance  with  imagined  sentences  improves  as  the  duration  of  EEG  data 
increases.  About  seven  trials  of  the  same  sentence  are  required  for  classification  of  the  imagined 
sentence  to  reach  statistical  significance.  These  results  suggest  imagining  speech  engages  some 
of  the  cortical  populations  involved  in  perceiving  speech,  as  suggested  by  models  of  speech 
perception  and  production. 
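
The effect of concatenating trials can be illustrated with a toy simulation. It assumes that each trial's EEG contains a weak, fixed-gain copy of the sentence envelope in white noise, and that a block of trials is classified by picking the candidate envelope with the highest correlation. The six envelopes are random sequences, not speech, and the gain, trial length and block counts are arbitrary; the source-imaging step used in the study is not modeled.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sentences, n_samples, gain = 6, 500, 0.07     # illustrative values

def make_envelope():
    """Smoothed random sequence standing in for a sentence loudness envelope."""
    raw = np.convolve(rng.normal(size=n_samples + 9), np.ones(10) / 10, mode="valid")
    return (raw - raw.mean()) / raw.std()

envelopes = np.stack([make_envelope() for _ in range(n_sentences)])

def simulate_trial(sentence):
    """One trial of simulated EEG: weak envelope copy plus unit-variance noise."""
    return gain * envelopes[sentence] + rng.normal(size=n_samples)

def classify_block(eeg_block, n_trials):
    """Pick the sentence whose repeated envelope best correlates with the block."""
    templates = np.tile(envelopes, (1, n_trials))
    scores = [np.corrcoef(eeg_block, tpl)[0, 1] for tpl in templates]
    return int(np.argmax(scores))

def accuracy(n_trials, n_blocks=120):
    """Six-way accuracy when n_trials trials of one sentence are concatenated."""
    hits = 0
    for block in range(n_blocks):
        sentence = block % n_sentences
        eeg_block = np.concatenate([simulate_trial(sentence) for _ in range(n_trials)])
        hits += classify_block(eeg_block, n_trials) == sentence
    return hits / n_blocks

results = {k: accuracy(k) for k in (1, 2, 4, 8)}
for k, acc in results.items():
    print(f"{k} trial(s) concatenated: accuracy = {acc:.2f}")
```

Longer concatenated segments average out more of the noise, so accuracy rises with the number of trials, in line with the trend described for Figure 8.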


[Figure 8 panel titles. Left: Heard EEG classification for all subjects. Right: Imagined EEG classification for all subjects.]

Figure 8. Classification performance as a function of (left plot) heard EEG data duration for each of the eight
subjects  and  (right  plot)  imagined  EEG  data  duration  for  each  of  eight  subjects.  Note  the  different  vertical  scales 
for  the  plot  of  heard  sentence  classifiability  (left)  and  imagined  sentence  classifiability  (right).  Each  curve  shows 
the  results  for  one  subject.  Measurements  for  trials  using  the  same  sentence  were  concatenated  to  produce 
segments of longer duration.

Imagined word and sentence identification using EEG. The final chapter of Dr. Thom Lappas'
dissertation  (Lappas,  2011)  reports  the  results  of  an  experiment  on  the  use  of  EEG  to  determine 
which  of  six  possible  "Coordinate  Response  Measure"  sentences  a  subject  was  imagining.  Six 
sentences  that  vary  in  callsign  and  color  words  were  used:  "Ready  [callsign]  go  to  [color]  now", 
with  three  callsign  words  "baron",  "eagle"  and  "tiger"  and  two  color  words  "red"  and  "green". 
Each  trial  started  with  one  of  the  six  possible  sentences  presented  aloud  (the  cue).  The  cue 
period  was  followed  at  a  set  interval  by  two  clicks  to  help  subjects  produce  the  cued  sentence  in 
imagination  during  the  targeted  production  period  (see  Figure  9).  The  initial  second  in  each  trial 
produced EEG that was used to normalize the power spectrum, for each trial and channel. EEG
data recorded during the trial period 4.3 - 7.2 sec were then subjected to sequential feature analysis with
the  aim  of  determining  which  channels,  times  and  frequencies  are  most  informative  when 
performing  a  six-way  classification  of  single  trials.  Leave-one-out  cross-validation  was  used  to 
determine  classification  performance  rates. 
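
Leave-one-out cross-validation for a six-way single-trial classification can be sketched as below. The classifier used in the dissertation is not described in this report, so a nearest-class-mean rule is assumed as a stand-in. The features are synthetic, and a shuffled-label run is included to show what chance-level performance looks like.

```python
import numpy as np

def loo_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-class-mean classifier."""
    n = len(labels)
    hits = 0
    for i in range(n):
        keep = np.arange(n) != i                      # hold out trial i
        classes = np.unique(labels[keep])
        centroids = np.stack([features[keep & (labels == c)].mean(axis=0)
                              for c in classes])
        distances = np.linalg.norm(centroids - features[i], axis=1)
        hits += int(classes[np.argmin(distances)] == labels[i])
    return hits / n

rng = np.random.default_rng(0)
n_classes, n_per_class, n_features = 6, 20, 10        # illustrative sizes
class_means = rng.normal(0.0, 1.0, (n_classes, n_features))
labels = np.repeat(np.arange(n_classes), n_per_class)
features = class_means[labels] + rng.normal(0.0, 1.0, (labels.size, n_features))

acc = loo_accuracy(features, labels)
acc_shuffled = loo_accuracy(features, rng.permutation(labels))
print(f"LOO accuracy: {acc:.2f}; with shuffled labels: {acc_shuffled:.2f}")
```

Each trial is classified by a model that never saw it, so the resulting rate estimates performance on new trials from the same session.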


[Figure 9 timeline. Time axis marks (sec): 0.0, 1.0, 3.5, 3.9, 4.3, 4.7, 7.2. The example sentence "Ready Baron go to Green Now." is shown twice on the timeline.]


Figure  9.  Trial  time  course  for  the  Coordinate  Response  Measure  experiment.  The  cued  sentence  was  heard 
aloud; these auditory stimuli were recorded beforehand in the subject's own voice. After two clicks, the subject's task
was  to  reproduce  the  cued  sentence  in  imagination  with  the  onset  indicated  above.  EEG  data  were  analyzed  during 
the  period  4.3  -  7.2  sec. 


Six-way  sentence  classification  performance  rates  found  for  each  subject's  daily  data,  using  these 
most  informative  features,  are  highly  significant.  Three  of  the  subjects  had  days  on  which 
performance rates exceeded 50%, a level three times the guessing rate of about 17% (1 in 6). Three-way
classification of callsign words was highly significant for all subjects and reached 70% and 73%
for the top two subjects, which compares favorably to the guessing rate of 33%. Finally, two-way
classification of color words was highly significant for all subjects and reached 83% and 84% for
the top two, which compares favorably to the guessing rate of 50%. While the most informative
features  varied  across  subjects  and  days,  aggregating  informative  features  across  subjects  and 
experimental  sessions  shows  that  the  most  informative  channels  tend  to  lie  over  temporal  cortex, 
especially temporal cortex in the left hemisphere (see Figure 10).

Figure 10. Locations and frequency of occurrence of the most informative electrodes found across the
three classifications and across all days of data collection for the seven subjects in the CRM
experiment. The colorbar at right relates disk color to the frequency of occurrence of a particular
electrode location in the aggregate list of
informative  features. 
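
One common way to judge whether a classification rate exceeds the guessing rate is an exact one-sided binomial test. The sketch below is not necessarily the test used in the dissertation. The trial count of 100 is hypothetical, since the number of trials per day is not stated here, so the printed p-values are illustrative only.

```python
from math import comb

def binomial_tail(n_trials, n_correct, chance):
    """P(X >= n_correct) for X ~ Binomial(n_trials, chance)."""
    return sum(comb(n_trials, k) * chance**k * (1 - chance)**(n_trials - k)
               for k in range(n_correct, n_trials + 1))

n_trials = 100   # hypothetical number of trials, for illustration only
cases = {
    "six-way sentences (50% vs 1/6)": (50, 1 / 6),
    "three-way callsigns (70% vs 1/3)": (70, 1 / 3),
    "two-way colors (83% vs 1/2)": (83, 1 / 2),
}
p_values = {name: binomial_tail(n_trials, correct, chance)
            for name, (correct, chance) in cases.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.2e}")
```

The same percentage correct is far more surprising against a guessing rate of 1 in 6 than against 1 in 2, which is why each rate is compared with its own chance level.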


Use of EEG to determine who somebody is listening to. As part of his dissertation work, UC
Irvine  graduate  student  Cort  Horton  studied  the  information  found  in  EEG  while  a  person  listens 
to  one  of  two  or  more  speakers  (Horton  et  al.,  2013).  Recent  studies  report  activity  in  auditory 
cortex  that  is  phase-locked  to  the  envelope  of  speech,  but  it  remains  unclear  whether  this  phase- 
locked  representation  requires  comprehension,  how  it  interacts  with  attention,  and  the  extent  to 
which  it  is  hemispherically  lateralized.  EEG  was  recorded  from  10  adults  while  they  selectively 
attended  to  amplitude-modulated  speech  coming  from  one  speaker,  while  ignoring  speech  from 
another.  Detailed  timing  and  topographic  information  about  the  envelope  representations  was 
extracted  by  cross-correlating  the  attended  and  unattended  stimulus  envelopes  with  each  channel 
of  EEG,  as  well  as  components  produced  via  ICA  decomposition  of  the  data. 
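
The cross-correlation step can be sketched for a single channel as follows. The envelopes are smoothed random sequences rather than speech, the simulated EEG follows the attended envelope more strongly than the unattended one at a fixed delay, and the gains, delay and sampling rate are assumptions made for illustration. The ICA step is not modeled.

```python
import numpy as np

def lagged_correlation(envelope, eeg, max_lag):
    """Pearson r between envelope(t) and eeg(t + lag) for lag = 0..max_lag samples."""
    n = len(envelope)
    return np.array([np.corrcoef(envelope[:n - lag], eeg[lag:])[0, 1]
                     for lag in range(max_lag + 1)])

def smooth_noise(rng, n, width=5):
    """Lightly smoothed, standardized random sequence standing in for an envelope."""
    raw = np.convolve(rng.normal(size=n + width - 1), np.ones(width) / width,
                      mode="valid")
    return (raw - raw.mean()) / raw.std()

rng = np.random.default_rng(0)
fs, n, delay = 250, 2500, 25            # 10 s at 250 Hz; 25 samples = 100 ms (assumed)
attended = smooth_noise(rng, n)
unattended = smooth_noise(rng, n)

# Simulated channel: delayed copies of both envelopes, attended one stronger.
eeg = (0.4 * np.roll(attended, delay)
       + 0.1 * np.roll(unattended, delay)
       + rng.normal(size=n))

xc_att = lagged_correlation(attended, eeg, max_lag=60)
xc_unatt = lagged_correlation(unattended, eeg, max_lag=60)
peak_lag = int(np.argmax(xc_att))
print(f"attended peak at {1000 * peak_lag / fs:.0f} ms, r = {xc_att.max():.2f}; "
      f"unattended peak r = {xc_unatt.max():.2f}")
```

Repeating this for every channel gives one cross-correlation function per electrode, from which lag-specific topographies can be drawn.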

Results  show  that  the  envelopes  of  both  attended  and  unattended  speech  are  encoded  in  the  EEG, 
including  several  lateralized  responses  (see  Figure  11).  However,  there  are  large  differences 
between  the  attended  and  unattended  cross-correlation  functions.  In  addition,  trial  stimuli  were 
amplitude-modulated  at  40  and  41  Hz  in  order  to  induce  steady-state  responses.  Those  steady- 
state  responses  were  not  affected  by  attention.  The  data  suggest  that  mechanisms  for  selective 
attention to one of multiple speakers involve frequency-band limited enhancement and
suppression, as well as a modulation of the phase of endogenous theta activity in auditory cortex
to align high-excitability periods with attended speech syllables.

Figure 11. Envelope-EEG cross-correlations. Plots of the grand-averaged correlation between each
channel of EEG and the stimulus envelopes as a function of lag. Each trace represents a separate
channel of EEG. The dashed lines indicate the 0.5th and 99.5th percentile of correlation values
observed in the control condition. Several delays in the cross-correlation functions are further
illustrated with topographic plots underneath. Warm colors denote correlations with positive
potentials, while cool colors denote correlations with negative potentials.

Horton and colleagues (2015) extended the result by investigating the accuracy with which a
subject’s  locus  of  attention  during  a  “cocktail  party”  task  can  be  ascertained  from  speech 
loudness  envelope  responses  present  within  single  trials  of  EEG.  It  was  found  that  the  attended 
speaker  can  be  determined  reliably  from  short  periods  of  EEG,  with  accuracy  improving  as  a 
function  of  trial  length.  Furthermore,  the  performance  of  this  speech  loudness  envelope-based 
attention  classifier  was  compared  to  others  based  on  changes  in  steady-state  responses  (elicited 
via  40  and  41  Hz  amplitude  modulations  of  the  speech)  and  hemispheric  lateralization  of  alpha 
power.  We  found  that  the  neural  responses  to  the  speech  loudness  envelopes  were  far  more 
robust  indicators  of  attention  than  the  others.  Figure  12  shows  that  the  use  of  EEG  identified  as 
informative  through  cross-correlation  with  speech  stimulus  envelopes  (leftmost  panel)  leads  to 
strong  classification  results,  while  the  use  of  alpha  lateralization  indices  (middle  panel)  or 
auditory steady-state response magnitudes (rightmost panel) does not. These results suggest that
envelope-related signals recorded in EEG data can be used to form robust auditory BCIs that do
not  require  artificial  manipulation  (e.g.,  amplitude  modulation)  of  stimuli  in  order  to  function. 


[Figure 12 panel titles, left to right: Cross-Correlations; Alpha Lateralization; ASSR Magnitudes]


Figure  12.  Classification  accuracy  is  plotted  as  a  function  of  EEG  sample  length  for  each  of  three  different 
classifiers  that  make  use  of  different  features  in  the  EEG.  The  mean  accuracies  for  each  subject  are  indicated  by 
the  individual  data  points,  with  the  subject  mean  indicated  by  the  black  line.  Chance  is  marked  with  the  dashed 
line  at  50%  classification  accuracy.  Significantly  above-chance  accuracy  values  are  marked  by  blue  circles,  while 
non-significant  values  are  indicated  by  red  crosses.  Measures  of  the  hemispheric  lateralization  of  alpha  power 
provide  poor  classification  accuracy  (middle  plot),  as  do  magnitudes  of  auditory  steady-state  responses  to 
modulations at 40 and 41 Hz (right plot). Cross-correlations between EEG and speech loudness envelopes provide
a  robust  means  to  extract  information  that  produces  accurate  determination  of  whether  one  is  covertly  paying 
attention  to  the  speaker  heard  through  the  left  ear  or  the  speaker  heard  through  the  right  ear. 
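
The envelope-based decoding idea can be illustrated with a minimal sketch. It is not the classifier of Horton and colleagues: it assumes a single EEG channel, white-noise stand-ins for the two speech loudness envelopes, and a hypothetical decision rule (the envelope with the larger summed absolute lagged correlation is taken as attended). All function names and parameter values are illustrative.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(envelope, eeg, lag):
    """Correlate envelope[t] with eeg[t + lag], i.e. EEG trailing the stimulus."""
    if len(envelope) != len(eeg):
        raise ValueError("envelope and EEG must have the same length")
    if lag < 0 or lag >= len(eeg):
        raise ValueError("lag must lie in [0, len(eeg))")
    return pearson(envelope[: len(envelope) - lag], eeg[lag:])


def classify_attended(env_left, env_right, eeg, lags):
    """Return 'left' or 'right' using summed absolute correlation over lags."""
    score_left = sum(abs(lagged_correlation(env_left, eeg, k)) for k in lags)
    score_right = sum(abs(lagged_correlation(env_right, eeg, k)) for k in lags)
    return "left" if score_left >= score_right else "right"


def demo(seed=0, n=2000, true_lag=25):
    """Simulate EEG that follows the left envelope and classify the trial."""
    rng = random.Random(seed)
    env_left = [rng.gauss(0, 1) for _ in range(n)]
    env_right = [rng.gauss(0, 1) for _ in range(n)]
    # Simulated EEG: noise plus a delayed copy of the attended (left) envelope.
    eeg = [rng.gauss(0, 1) for _ in range(n)]
    for t in range(true_lag, n):
        eeg[t] += 0.5 * env_left[t - true_lag]
    return classify_attended(env_left, env_right, eeg, lags=range(0, 50, 5))
```

Longer trials give more samples per correlation estimate, which is one plausible reason for accuracy to improve with trial length as reported above. Real speech envelopes are low-pass signals rather than white noise, so the simulated margins here are optimistic.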


2B.  INTENDED  DIRECTION 

Project  research  on  intended  direction  included  work  on  covert  attention  that  bears  some 
similarities  to  the  just-discussed  work  on  the  direction  of  attention  to  particular  speech  streams. 
The difference is that the studies to be discussed in this section require attention to be
directed in a particular spatial direction. An overt shift of attention is accompanied by eye,
head, and body movements. Covert shifts are not outwardly evident. Yet covert shifts have been
shown for decades to influence performance: they enhance speed and accuracy in tasks involving
stimuli that lie in the covertly attended direction.

An  experiment  with  EEG  showed  that  one  can  use  EEG  to  determine  whether  one  is  attending  to 
a  speech  stream  at  left  or  one  at  right  (Thorpe  et  al.,  2011).  Results  suggest  that  attention  to 
auditory  stimuli  activates  brain  networks  similar  to  those  activated  during  the  direction  of  visual 
attention.  A  study  of  covert  attention  to  visual  stimuli  used  EEG  to  perform  a  visual  perimetry 
based on the bottom-up direction of attention (Coleman, 2014). Results not only replicate earlier
ones  for  the  lateralization  of  EEG  response  in  occipital  cortex  based  on  target  presence  in  left  or 
right  visual  hemifield,  but  also  demonstrate  a  vertical  gradient  in  occipital  cortical  EEG  activity 
that  depends  on  visual  target  vertical  position. 

EEG signals concerning the direction of attention to auditory stimuli. A UCI graduate student
conducted ARO MURI-supported work as part of his dissertation research (Thorpe et al., 2011).
A  cued  spatial  attention  experiment  was  conducted  to  investigate  the  time-frequency  structure  of 
human  EEG  induced  by  attentional  orientation  of  an  observer  in  external  auditory  space. 
Attention  was  cued  to  one  of  two  spatial  locations,  at  left  and  right,  respectively  (see  Figure  13). 
Subjects  were  instructed  to  report  the  speech  stimulus  at  the  cued  location  and  to  ignore  a 
simultaneous  speech  stream  originating  from  the  uncued  location.  Analysis  used  wavelet  spectra 
to  normalize  response  in  each  EEG  frequency  band  by  the  mean  level  observed  in  the  early  part 
of  the  cue  interval,  with  the  aim  of  measuring  induced  power  related  to  the  deployment  of 
attention. 
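
The baseline normalization described above can be sketched as follows. This is a minimal illustration, not the analysis code of Thorpe et al.: the complex Morlet wavelet, its number of cycles, the sampling rate, and the baseline window are all assumptions chosen for the example.

```python
import cmath
import math


def morlet_power(signal, fs, freq, n_cycles=6):
    """Power over time at `freq` (Hz) from a complex Morlet wavelet transform."""
    sigma_t = n_cycles / (2.0 * math.pi * freq)   # s.d. of the Gaussian window (s)
    half = int(math.ceil(3.0 * sigma_t * fs))     # wavelet support: +/- 3 s.d.
    wavelet = []
    for k in range(-half, half + 1):
        t = k / fs
        gauss = math.exp(-t * t / (2.0 * sigma_t ** 2))
        wavelet.append(cmath.exp(2j * math.pi * freq * t) * gauss)
    norm = math.sqrt(sum(abs(w) ** 2 for w in wavelet))
    wavelet = [w / norm for w in wavelet]
    n = len(signal)
    power = []
    for i in range(n):
        acc = 0j
        for j, w in enumerate(wavelet):
            idx = i + j - half
            if 0 <= idx < n:                      # zero-padding at the edges
                acc += signal[idx] * w.conjugate()
        power.append(abs(acc) ** 2)
    return power


def baseline_normalize(power, fs, baseline_start, baseline_end):
    """Divide power by its mean over the window [start, end), given in seconds."""
    i0, i1 = int(baseline_start * fs), int(baseline_end * fs)
    base = sum(power[i0:i1]) / (i1 - i0)
    return [p / base for p in power]


def demo():
    """A 10 Hz tone whose amplitude doubles at t = 2 s; baseline is 0.5-1.0 s."""
    fs = 250.0
    signal = []
    for i in range(1000):
        t = i / fs
        amplitude = 1.0 if t < 2.0 else 2.0
        signal.append(amplitude * math.sin(2.0 * math.pi * 10.0 * t))
    power = morlet_power(signal, fs, freq=10.0)
    return baseline_normalize(power, fs, 0.5, 1.0)
```

In the demo, doubling the amplitude should raise the normalized power to roughly four times the baseline level, since power scales with the square of amplitude. Expressing each band relative to its own baseline puts the bands on a common scale despite their very different absolute power levels.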

Figure 13. The experiment design in the work by Thorpe et al. (2011). Above is shown the physical layout of the
experiment.  Each  speaker  was  45  deg  away  from  fixation  and  could  not  be  seen  by  an  observer  without  moving 
the  eyes.  Below  is  shown  the  trial  time  course.  In  the  example  shown,  the  subject  is  cued  to  attend  to  the  left 
speaker.  After  a  variable  interstimulus  interval  (ISI,  from  500-1500  msec),  two  different  sentences  are  played 
through  the  speakers.  The  subject's  task  is  to  indicate  the  codeword  and  color  word  played  from  the  cued  speaker. 
The  volume  of  the  uncued  speaker  is  high  enough  to  necessitate  strict  attention  to  sound  in  the  direction  of  the  cued 
speaker. 


Topographies of band-specific induced power during the cue and interstimulus intervals showed
peaks over symmetric bilateral scalp areas (see Figure 14 for the alpha power band result).
Results  suggest  that  the  deployment  and  maintenance  of  spatially-oriented  attention  through  a 
period  of  1100  msec  is  marked  by  distinct  episodes  of  reliable  hemispheric  lateralization 
ipsilateral  to  the  direction  in  which  attention  is  oriented.  An  early  theta  lateralization  was 
evident  over  posterior  parietal  electrodes  and  was  sustained  through  the  interstimulus  interval.  In 
the  alpha  and  mu  bands,  punctuated  episodes  of  parietal  power  lateralization  were  observed 
roughly  500  msec  after  attentional  deployment,  consistent  with  previous  studies  of  visual 
attention.  In  the  beta  band,  these  episodes  show  similar  patterns  of  lateralization  over  frontal 
motor areas. These results indicate that spatial attention involves similar mechanisms in the
auditory  and  visual  modalities. 


Figure  14.  Induced  alpha  band  power  is  shown  averaged  over  the  cue  and  interstimulus  intervals  for  left-cued 
(LC)  and  right-cued  (RC)  attention  conditions,  respectively.  Two  pairs  of  symmetric  channel  groups  appear  as 
peaks  in  the  resultant  power  band  topographies.  These  are  situated  over  bilateral  motor  (pink/blue  dots  for  LC/RC, 
respectively) and parietal areas (magenta/cyan dots for LC/RC, respectively).


Use  of  EEG  to  track  bottom-up  visual  attention  in  two  dimensions.  UCI  graduate  student 
Robert Coleman studied visual attention in his MURI-supported dissertation work (Coleman,
2014).  His  results  suggest  that  EEG  may  be  used  to  discern  the  covert  direction  of  visual 
attention  in  one  or  two  dimensions.  Subjects  in  an  EEG  experiment  sat  in  front  of  a  monitor  and 
maintained  fixation  on  the  center  of  the  monitor’s  screen  while  directing  attention  covertly  to  a 
sequence  of  target  letters.  For  each  presentation,  the  subjects  responded  with  a  button  press  to 
indicate  target  identity.  Visual  targets  were  the  single  letters  A,  F,  H,  or  L.  Each  letter  was 
presented  for  1500  msec  and  was  followed  immediately  by  the  next  letter.  Target  locations 
varied  in  three  different  ways  to  provide  three  experimental  conditions: 

1.  the  targets  varied  in  horizontal  position,  with  positions  drawn  from  a  uniform  distribution 
on  the  range  [-19.2,  19.2]  degrees  of  visual  angle  (deg);  these  were  vertically  centered 
and  varied  in  azimuth; 

2.  the  targets  varied  in  vertical  position,  with  positions  drawn  from  a  uniform  distribution 
on  the  range  [-19.2,  19.2]  deg;  these  were  horizontally  centered  and  varied  in  elevation; 

3.  target  position  varied  in  two  dimensions,  with  positions  drawn  from  a  uniform 
distribution  on  the  square  [-19.2,  19.2]  deg  x  [-19.2,  19.2]  deg  in  the  center  of  the  display. 

Analysis used common spatial pattern features from 64-channel EEG measured within alpha (8-12 Hz),
low-beta (16-20 Hz), and high-beta (22-26 Hz) bands and within 350-600 msec of target
presentation onset (see Figure 15). To assess how well EEG can be used to determine letter
location, the range of letter locations was divided into two, three, or five equal-sized intervals for
the  horizontal  and  vertical  conditions,  and  was  divided  into  four  or  nine  equal-sized  square 
sectors  for  the  2D  condition.  Support  vector  machine  and  50-tree  random  forest  classifiers  were 
used  to  classify  trials  from  two  subjects  according  to  target  letter  position.  Classifications  into 
two,  three  or  five  sectors  in  horizontal  and  vertical  conditions  were  statistically  significant  for 
both  subjects;  classifications  into  four  or  nine  sectors  in  the  2D  condition  were  also  significant. 
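
The division of target positions into class labels can be sketched as follows. The range of 19.2 deg comes from the text; the function names, the 0-based numbering, and the row-by-row ordering of sectors are assumptions made for illustration.

```python
EXTENT = 19.2  # half-width of the target range, in degrees of visual angle


def interval_index(position, n_bins, extent=EXTENT):
    """Map a position in [-extent, extent] to one of n_bins equal intervals (0-based)."""
    if not -extent <= position <= extent:
        raise ValueError("position outside the stimulus range")
    width = 2.0 * extent / n_bins
    # The upper edge of the range is assigned to the last interval.
    return min(int((position + extent) / width), n_bins - 1)


def sector_index(x, y, n_per_side, extent=EXTENT):
    """Map (x, y) to one of n_per_side**2 square sectors, numbered row by row."""
    col = interval_index(x, n_per_side, extent)
    row = interval_index(y, n_per_side, extent)
    return row * n_per_side + col


def chance_level(n_classes):
    """Chance accuracy when all classes are equally likely."""
    return 1.0 / n_classes
```

Because positions were drawn from a uniform distribution and the intervals are of equal size, the classes are equally likely, so chance accuracy falls from 1/2 for two intervals to 1/9 for nine sectors. Significance has to be judged against the chance level for the number of classes in question.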

Figure 15. The topographies of random forest classifier permutation feature importance indicate that
right-lateralized occipito-parietal alpha and low-beta power are the predominant features used to reliably predict covert
visual  attention  location  along  the  azimuth.  Data  from  the  horizontal  condition  and  the  2D  condition  were  used  to 
generate  these  topographies.  A  similar  analysis  suggests  that  central  occipito-parietal  low-beta  power  is  the  most 
important  feature  when  identifying  vertical  location. 


The  overall  result  is  that  EEG  can  be  used  to  passively  detect  the  spatial  position  of  visual 
attention,  at  varying  degrees  of  precision,  as  a  person  attends  to  objects  they  see. 

EEG-based  BCI  for  movement  control.  In  a  first  stage  of  this  project,  software  for  the  action 
game  Quake  2  was  modified  to  receive  and  act  on  movement  control  signals  provided  by  EEG 
signal-processing  software.  The  EEG  signal-processing  software  was  trained  to  recognize  neural 
activity,  generated  by  the  user  of  the  brain-computer  interface,  meant  to  signal  turn  left,  turn 
right,  go,  or  stop.  A  naive  Bayes  classifier  was  used  to  process  data  from  a  brief  training 
experiment  in  which  the  user  signaled  these  four  categories  without  any  movement.  We  were 
able  to  demonstrate  successful  navigation  through  a  Quake  2  game  level  using  brain  waves  with 
these  methods. 
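
A four-command naive Bayes classifier of the kind mentioned above can be sketched as follows. The report does not specify the EEG features or the likelihood model, so this example assumes Gaussian likelihoods and a hypothetical two-dimensional feature vector with synthetic training data.

```python
import math
import random
from collections import defaultdict

COMMANDS = ("left", "right", "go", "stop")


def fit_gaussian_nb(features, labels, var_floor=1e-6):
    """Estimate per-class log priors, feature means, and feature variances."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    total = len(labels)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / total), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior, assuming independent features."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2.0 * math.pi * var) - (v - m) ** 2 / (2.0 * var)
            for v, m, var in zip(x, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))


def demo(seed=0):
    """Train on synthetic clusters and classify each cluster center."""
    rng = random.Random(seed)
    centers = {"left": (-2.0, 0.0), "right": (2.0, 0.0),
               "go": (0.0, 2.0), "stop": (0.0, -2.0)}
    features, labels = [], []
    for label in COMMANDS:
        cx, cy = centers[label]
        for _ in range(50):
            features.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
            labels.append(label)
    model = fit_gaussian_nb(features, labels)
    return {label: predict(model, centers[label]) for label in COMMANDS}
```

Naive Bayes needs only per-class means and variances, so it can be fit from a brief training session, which suits the short calibration experiment described in the text.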

We  next  extended  these  methods  to  handle  robot  control.  Graduate  research  assistant  Zack 
Wisti, aided by Associate Specialist John Hagedorn and by D'Zmura, implemented robot remote
control by mobile human subjects. A custom-built, portable, tetherless, 64-electrode EEG
system from ANT, acquired using DURIP funding, is worn in a backpack by a mobile human
subject.  A  laptop  computer  within  the  backpack  performs  EEG-based  brain-computer  interface 
signal  processing.  The  signal-processing  computer  is  connected  wirelessly  to  a  computer 
onboard  the  robot  (see  Figure  16). 

A  simple  navigational  scheme  was  used  to  control  turns  to  left  and  right  and  to  control  stopping 
and  starting.  Measurements  of  alpha  power  in  left  hemisphere  and  right  hemisphere  electrodes, 
respectively,  are  used  for  control.  The  measured  difference  in  alpha  power  between  left  and 
right  hemispheres  is  used  to  cause  either  left  or  right  turns  by  the  robot.  Likewise,  the  total  alpha 
power  in  left  and  right  hemispheres  is  used  to  modulate  speed.  A  high  alpha  power  signals 
relaxation  or  closed  eyes  and  is  a  natural  signal  for  stopping.  Reduction  in  alpha  due  to  alertness 
causes  the  robot  to  start  moving  forward  again. 
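
The navigational scheme can be sketched as a mapping from two alpha power measurements to a speed and a turn command. The report does not give thresholds, units, or the sign convention that links the hemispheric difference to turn direction, so the values, the linear speed rule, and the convention below (more left-hemisphere alpha produces a left turn) are all assumptions.

```python
def alpha_control(alpha_left, alpha_right, turn_threshold, stop_level):
    """Map left/right hemisphere alpha power to a (speed, turn) command.

    speed is in [0, 1]; it falls linearly as total alpha power rises and
    reaches 0 (stop) once total alpha power is at or above stop_level.
    turn is 'left', 'right', or 'none'.
    """
    total = alpha_left + alpha_right
    speed = max(0.0, 1.0 - total / stop_level)

    difference = alpha_left - alpha_right
    if difference > turn_threshold:
        turn = "left"       # assumed convention, not stated in the report
    elif difference < -turn_threshold:
        turn = "right"
    else:
        turn = "none"       # dead zone: small differences do not steer
    return speed, turn
```

The dead zone around zero difference is a design choice in this sketch: it keeps ordinary fluctuations in alpha power from producing constant small steering corrections.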

Figure 16. Mobile subject Hagedorn with red EEG gel cap controls navigation of robot (center). A signal-processing
computer within the backpack worn by the subject communicates wirelessly with a computer onboard
the  robot.  EEG  signals  are  processed  by  the  computer  in  the  backpack  to  control  the  robot's  left/right  turns  and 
starting/stopping.  Subject's  task  is  to  cause  the  robot  to  navigate  a  full  lap  of  a  slalom  course  marked  by  orange 
pylons. 